@TheBlueMatt
Collaborator

When `MonitorUpdateCompletionAction`s were added, we didn't
consider the case of a duplicate claim during normal HTLC
processing (as the handling only had an `if let` rather than a
`match`, which made the branch easy to miss). This can lead to a
channel freezing indefinitely if an HTLC is claimed (without a
`commitment_signed`), the peer disconnects, and then the HTLC is
claimed again, leading to a never-completing
`MonitorUpdateCompletionAction`.

The fix is simple - if we get back an
`UpdateFulfillCommitFetch::DuplicateClaim` when claiming from the
inbound edge, immediately unlock the outbound edge channel with a
new `MonitorUpdateCompletionAction::FreeOtherChannelImmediately`.
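A toy model may make the hang and the fix easier to see. This is a sketch under heavy simplifying assumptions: `ChannelId` as a `u64`, the `Node` struct, its `blocked`/`pending` fields, the `handle_claim_*` and `monitor_update_completed` methods, and the variant payloads are all hypothetical and do not reflect LDK's real types or signatures. Only the variant names `UpdateFulfillCommitFetch::DuplicateClaim` and `MonitorUpdateCompletionAction::FreeOtherChannelImmediately` come from the description above.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical channel identifier (LDK's real identifiers are richer).
type ChannelId = u64;

// Simplified, hypothetical result of claiming an HTLC on the inbound edge.
enum UpdateFulfillCommitFetch {
    // New preimage: a monitor update must be persisted before unblocking.
    NewClaim { monitor_update_id: u64 },
    // Preimage already claimed: no monitor update is generated at all.
    DuplicateClaim,
}

// Hypothetical completion action, reduced to the variant this PR adds.
enum MonitorUpdateCompletionAction {
    FreeOtherChannelImmediately { downstream: ChannelId },
}

#[derive(Default)]
struct Node {
    // Outbound-edge channels whose next revoke_and_ack is being held.
    blocked: HashSet<ChannelId>,
    // Monitor update id -> outbound-edge channel to free once it completes.
    pending: HashMap<u64, ChannelId>,
}

impl Node {
    fn run_action(&mut self, action: MonitorUpdateCompletionAction) {
        match action {
            MonitorUpdateCompletionAction::FreeOtherChannelImmediately { downstream } => {
                self.blocked.remove(&downstream);
            }
        }
    }

    // Buggy shape: `if let` silently ignores `DuplicateClaim`, so nothing
    // ever frees `downstream` and the channel stays frozen.
    fn handle_claim_buggy(&mut self, res: UpdateFulfillCommitFetch, downstream: ChannelId) {
        if let UpdateFulfillCommitFetch::NewClaim { monitor_update_id } = res {
            self.pending.insert(monitor_update_id, downstream);
        }
    }

    // Fixed shape: `match` forces the duplicate case to be handled, and it
    // frees the outbound edge right away since no update will complete.
    fn handle_claim_fixed(&mut self, res: UpdateFulfillCommitFetch, downstream: ChannelId) {
        match res {
            UpdateFulfillCommitFetch::NewClaim { monitor_update_id } => {
                self.pending.insert(monitor_update_id, downstream);
            }
            UpdateFulfillCommitFetch::DuplicateClaim => self.run_action(
                MonitorUpdateCompletionAction::FreeOtherChannelImmediately { downstream },
            ),
        }
    }

    // Called once a monitor update has been durably persisted.
    fn monitor_update_completed(&mut self, monitor_update_id: u64) {
        if let Some(downstream) = self.pending.remove(&monitor_update_id) {
            self.blocked.remove(&downstream);
        }
    }
}

fn main() {
    // Duplicate claim with the old handling: channel 2 is never unblocked.
    let mut buggy = Node::default();
    buggy.blocked.insert(2);
    buggy.handle_claim_buggy(UpdateFulfillCommitFetch::DuplicateClaim, 2);
    assert!(buggy.blocked.contains(&2));

    // Duplicate claim with the fix: channel 2 is freed immediately.
    let mut fixed = Node::default();
    fixed.blocked.insert(2);
    fixed.handle_claim_fixed(UpdateFulfillCommitFetch::DuplicateClaim, 2);
    assert!(fixed.blocked.is_empty());

    // A new claim still waits for its monitor update before unblocking.
    fixed.blocked.insert(3);
    fixed.handle_claim_fixed(UpdateFulfillCommitFetch::NewClaim { monitor_update_id: 9 }, 3);
    assert!(fixed.blocked.contains(&3));
    fixed.monitor_update_completed(9);
    assert!(fixed.blocked.is_empty());
}
```

The point of the `match` is that the compiler rejects a non-exhaustive match, so a future variant of the claim result cannot be skipped silently the way the `if let` allowed.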

@TheBlueMatt TheBlueMatt added this to the 0.0.118 milestone Oct 13, 2023
Contributor

@tnull tnull left a comment

First high-level pass.

@TheBlueMatt TheBlueMatt force-pushed the 2023-10-dup-claim-chan-hang branch from 1792bef to 9c487f4 Compare October 16, 2023 15:25
@codecov-commenter

codecov-commenter commented Oct 16, 2023

Codecov Report

Attention: 29 lines in your changes are missing coverage. Please review.

Comparison is base (6cafba9) 89.00% compared to head (5b71cd9) 89.63%.
Report is 22 commits behind head on main.

@@            Coverage Diff             @@
##             main    #2661      +/-   ##
==========================================
+ Coverage   89.00%   89.63%   +0.63%     
==========================================
  Files         112      112              
  Lines       87207    91365    +4158     
  Branches    87207    91365    +4158     
==========================================
+ Hits        77619    81897    +4278     
+ Misses       7353     7231     -122     
- Partials     2235     2237       +2     
Files                                            Coverage Δ
lightning/src/ln/chanmon_update_fail_tests.rs    97.71% <98.38%> (+0.01%) ⬆️
lightning/src/ln/channelmanager.rs               86.13% <83.62%> (+4.55%) ⬆️

... and 20 files with indirect coverage changes

Contributor

@valentinewallace valentinewallace left a comment

Still going through a first pass

Contributor

@valentinewallace valentinewallace left a comment

Probably ready for a 2nd reviewer

Comment on lines 5587 to 5590
let fee_earned_msat = if let Some(claimed_htlc_value) = htlc_claim_value_msat {
Some(claimed_htlc_value - forwarded_htlc_value)
} else { None };

Contributor

We can move this to where the event is generated below. Can we also stop gating this whole thing on `if let Some(forwarded_htlc_value) ..`, or add a comment explaining why we're doing so?

Collaborator Author

Yeah, it's fine because we only have `None` when claiming from a(n old) monitor, which we don't have to restore, but I'll move it. Good idea.
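As a side note on the snippet under discussion, the `if let ... else { None }` form is equivalent to `Option::map`. A minimal sketch, assuming both amounts are `u64` millisatoshi values (the snippet doesn't show their types) and that the claimed (inbound) value is presumably never below the forwarded value, since plain `-` on `u64` panics on underflow in debug builds. The function names are hypothetical wrappers for illustration.

```rust
// Form quoted in the review, wrapped in a hypothetical function.
fn fee_earned_if_let(htlc_claim_value_msat: Option<u64>, forwarded_htlc_value: u64) -> Option<u64> {
    if let Some(claimed_htlc_value) = htlc_claim_value_msat {
        Some(claimed_htlc_value - forwarded_htlc_value)
    } else {
        None
    }
}

// Equivalent form using `Option::map`: `None` passes through unchanged.
fn fee_earned_map(htlc_claim_value_msat: Option<u64>, forwarded_htlc_value: u64) -> Option<u64> {
    htlc_claim_value_msat.map(|claimed_htlc_value| claimed_htlc_value - forwarded_htlc_value)
}

fn main() {
    // 1_000_500 msat claimed, 1_000_000 msat forwarded: 500 msat earned.
    assert_eq!(fee_earned_map(Some(1_000_500), 1_000_000), Some(500));
    // No claim value (e.g. claiming from an old monitor): no fee reported.
    assert_eq!(fee_earned_map(None, 1_000_000), None);
    assert_eq!(
        fee_earned_if_let(Some(1_000_500), 1_000_000),
        fee_earned_map(Some(1_000_500), 1_000_000)
    );
}
```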

@valentinewallace
Contributor

Feel free to squash.

@TheBlueMatt TheBlueMatt force-pushed the 2023-10-dup-claim-chan-hang branch from 6ab8b00 to 80792aa Compare October 18, 2023 19:02
@TheBlueMatt
Collaborator Author

Pushed a number of further changes, so I didn't squash yet.

Contributor

@valentinewallace valentinewallace left a comment

No real feedback, LGTM after a 2nd reviewer.

@TheBlueMatt TheBlueMatt force-pushed the 2023-10-dup-claim-chan-hang branch from 80792aa to a6d4676 Compare October 18, 2023 20:33
@TheBlueMatt
Collaborator Author

Squashed with one additional assertion and a comment fix:

$ git diff-tree -U3 80792aab a6d4676c
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index 217922ca5..b12fc3c86 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -5617,10 +5617,21 @@ where
 								// There should be a `BackgroundEvent` pending...
 								assert!(background_events.iter().any(|ev| {
 									match ev {
-										// to apply a monitor update that blocked channel,
+										// to apply a monitor update that blocked the claiming channel,
 										BackgroundEvent::MonitorUpdateRegeneratedOnStartup {
-											funding_txo, ..
-										} => *funding_txo == claiming_chan_funding_outpoint,
+											funding_txo, update, ..
+										} => {
+											if *funding_txo == claiming_chan_funding_outpoint {
+												assert!(update.updates.iter().any(|upd|
+													if let ChannelMonitorUpdateStep::PaymentPreimage {
+														payment_preimage: update_preimage
+													} = upd {
+														payment_preimage == *update_preimage
+													} else { false }
+												), "{:?}", update);
+												true
+											} else { false }
+										},
 										// or the channel we'd unblock is already closed,
 										BackgroundEvent::ClosedMonitorUpdateRegeneratedOnStartup((funding_txo, ..))
 											=> *funding_txo == next_channel_outpoint,
$ 
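The added assertion can be read in isolation: among pending background events, either a regenerated monitor update for the claiming channel exists (and it must carry a `PaymentPreimage` step with the matching preimage), or the channel we'd unblock is already closed. The sketch below mirrors that structure using `matches!` with a guard in place of the nested `if let ... else { false }`. The type aliases, the `Other` step, the trimmed-down struct and enum definitions, and the helper name `blocking_event_pending` are hypothetical simplifications, not LDK's definitions.

```rust
// Hypothetical simplified stand-ins for the types named in the diff.
type Outpoint = u32;
type Preimage = [u8; 32];

#[derive(Debug)]
enum ChannelMonitorUpdateStep {
    PaymentPreimage { payment_preimage: Preimage },
    Other,
}

#[derive(Debug)]
struct ChannelMonitorUpdate {
    updates: Vec<ChannelMonitorUpdateStep>,
}

enum BackgroundEvent {
    MonitorUpdateRegeneratedOnStartup { funding_txo: Outpoint, update: ChannelMonitorUpdate },
    ClosedMonitorUpdateRegeneratedOnStartup((Outpoint, ChannelMonitorUpdate)),
}

// True if some pending event explains why the outbound edge is still blocked.
fn blocking_event_pending(
    events: &[BackgroundEvent],
    claiming_chan: Outpoint,
    next_chan: Outpoint,
    payment_preimage: Preimage,
) -> bool {
    events.iter().any(|ev| match ev {
        // A regenerated update for the claiming channel must hold the preimage.
        BackgroundEvent::MonitorUpdateRegeneratedOnStartup { funding_txo, update } => {
            if *funding_txo == claiming_chan {
                assert!(
                    update.updates.iter().any(|upd| matches!(upd,
                        ChannelMonitorUpdateStep::PaymentPreimage { payment_preimage: p }
                            if *p == payment_preimage)),
                    "{:?}", update
                );
                true
            } else {
                false
            }
        }
        // Or the channel we'd unblock is already closed.
        BackgroundEvent::ClosedMonitorUpdateRegeneratedOnStartup((funding_txo, ..)) => {
            *funding_txo == next_chan
        }
    })
}

fn main() {
    let preimage = [7u8; 32];
    let events = vec![BackgroundEvent::MonitorUpdateRegeneratedOnStartup {
        funding_txo: 1,
        update: ChannelMonitorUpdate {
            updates: vec![
                ChannelMonitorUpdateStep::Other,
                ChannelMonitorUpdateStep::PaymentPreimage { payment_preimage: preimage },
            ],
        },
    }];
    assert!(blocking_event_pending(&events, 1, 2, preimage));
}
```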

Contributor

@tnull tnull left a comment

LGTM, feel free to squash.

This may help in debugging blocking actions in the future.

While we'd previously avoided this, this is sadly now required in
the next commit.

When `MonitorUpdateCompletionAction`s were added, we didn't
consider the case of a duplicate claim during normal HTLC
processing (as the handling only had an `if let` rather than a
`match`, which made the branch easy to miss). This can lead to a
channel freezing indefinitely if an HTLC is claimed (without a
`commitment_signed`), the peer disconnects, and then the HTLC is
claimed again, leading to a never-completing
`MonitorUpdateCompletionAction`.

The fix is simple - if we get back an
`UpdateFulfillCommitFetch::DuplicateClaim` when claiming from the
inbound edge, immediately unlock the outbound edge channel with a
new `MonitorUpdateCompletionAction::FreeOtherChannelImmediately`.

Here we add the new variant, which we start generating in the next
commit.

When `MonitorUpdateCompletionAction`s were added, we didn't
consider the case of a duplicate claim during normal HTLC
processing (as the handling only had an `if let` rather than a
`match`, which made the branch easy to miss). This can lead to a
channel freezing indefinitely if an HTLC is claimed (without a
`commitment_signed`), the peer disconnects, and then the HTLC is
claimed again, leading to a never-completing
`MonitorUpdateCompletionAction`.

The fix is simple - if we get back an
`UpdateFulfillCommitFetch::DuplicateClaim` when claiming from the
inbound edge, immediately unlock the outbound edge channel with a
new `MonitorUpdateCompletionAction::FreeOtherChannelImmediately`.

Here we implement this fix by actually generating the new variant
when a claim is duplicative.

@TheBlueMatt TheBlueMatt force-pushed the 2023-10-dup-claim-chan-hang branch from 5b71cd9 to f47270e Compare October 19, 2023 15:28
@TheBlueMatt
Collaborator Author

Squashed with a small wording tweak in the log:

$ git diff-tree -U1 5b71cd9a f47270e7
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index d66b6c478..1a4bdfbf6 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -6542,3 +6542,3 @@ where
 							log_trace!(self.logger,
-								"Holding the next revoke_and_ack from {} until the preimage is durably in the inbound edge's ChannelMonitor",
+								"Holding the next revoke_and_ack from {} until the preimage is durably persisted in the inbound edge's ChannelMonitor",
 								msg.channel_id);
@@ -10130,3 +10130,3 @@ where
 								log_trace!(args.logger,
-									"Holding the next revoke_and_ack from {} until the preimage is durably in the inbound edge's ChannelMonitor",
+									"Holding the next revoke_and_ack from {} until the preimage is durably persisted in the inbound edge's ChannelMonitor",
 									blocked_channel_outpoint.to_channel_id());
$ 

@TheBlueMatt TheBlueMatt merged commit d7a6d0d into lightningdevkit:main Oct 19, 2023

4 participants